# A tibble: 6 × 4
# Groups: poverty_status [2]
poverty_status cognitive_level count proportion
<chr> <fct> <int> <dbl>
1 Above Poverty Line Low Cognitive Performance 647 0.413
2 Above Poverty Line Medium Cognitive Performance 745 0.476
3 Above Poverty Line High Cognitive Performance 173 0.111
4 Below Poverty Line Low Cognitive Performance 487 0.441
5 Below Poverty Line Medium Cognitive Performance 501 0.453
6 Below Poverty Line High Cognitive Performance 117 0.106
Visualizing Association: Poverty and Cognitive Performance
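Weak (possibly no) association: The distribution of cognitive levels is nearly identical above and below the poverty line (the two groups' proportions differ by less than 3 percentage points at every level), a much less clear relationship than the one between poverty and savings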
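One way to check the "weak or no association" reading is to rebuild the contingency table from the six counts shown above and compare the row proportions directly. This is a minimal sketch in base R; the chi-squared test at the end is an optional addition for illustration, not part of the original analysis.

```r
# Counts copied from the summary table above
# (rows: poverty status, columns: cognitive performance level)
counts <- matrix(
  c(647, 745, 173,
    487, 501, 117),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    poverty_status  = c("Above Poverty Line", "Below Poverty Line"),
    cognitive_level = c("Low", "Medium", "High")
  )
)

# Row proportions: share of each cognitive level within each poverty group
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)

# Largest gap between the two groups across the three levels
max_gap <- max(abs(row_props[1, ] - row_props[2, ]))
max_gap

# Optional: chi-squared test of independence as one way to formalize the comparison
chisq.test(counts)
```

The row proportions reproduce the `proportion` column of the table, and `max_gap` makes the size of the difference between the two groups explicit.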
Requirement 2: Temporal Ordering
Cause Must Come Before Effect
Seems obvious, but it’s often unclear in real data
Examples where timing matters:
Poverty & Crime
flowchart LR
A[<b>Poverty</b>] --> B[<b>Crime</b>]
classDef startNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
OR
flowchart LR
A[<b>Crime</b>] --> B[<b>Poverty</b>]
classDef startNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
Knowledge & Voting
flowchart LR
A[<b>Political Knowledge</b>] --> B[<b>Voting</b>]
classDef startNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
OR
flowchart LR
A[<b>Voting</b>] --> B[<b>Political Knowledge</b>]
classDef startNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
Spending & Votes
flowchart LR
A[<b>Campaign Spending</b>] --> B[<b>Expected Victory</b>]
classDef startNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
OR
flowchart LR
A[<b>Expected Victory</b>] --> B[<b>Campaign Donations</b>]
classDef startNode fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef endNode fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class A startNode
class B endNode
linkStyle 0 stroke:#ffffff,stroke-width:3px
The Challenge of Simultaneous Measurement
# Most data is cross-sectional - measured at the same time
poverty %>%
  select(treatment, income_less20k, accts_amt, stroop_time, cash) %>%
  head(5)
# A tibble: 5 × 5
treatment income_less20k accts_amt stroop_time cash
<chr> <dbl> <dbl> <dbl> <dbl>
1 Before Payday 0 3000 7.61 30
2 Before Payday 1 800 7.27 75
3 After Payday 0 15000 7.71 110
4 After Payday 0 NA 7.35 160
5 Before Payday 0 40000 7.37 40
Problem: We see poverty, savings, cognitive performance, and cash measured simultaneously. We can’t tell which came first:
Did poverty lead to lower savings, or did lack of savings lead to poverty?
Did poverty affect cognitive performance, or did poor cognitive performance lead to poverty?
Did having more cash improve both savings and cognitive performance?
Cross-sectional data cannot establish temporal ordering
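A small simulation can illustrate why a single snapshot cannot reveal direction. The sketch below uses made-up data (the variable names, effect size of 0.6, and seed are all assumptions chosen for illustration): in one world x is generated first and influences y, in the other y comes first and influences x.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
n <- 5000

# World A: x is generated first and then influences y
x_a <- rnorm(n)
y_a <- 0.6 * x_a + rnorm(n, sd = 0.8)

# World B: y is generated first and then influences x
y_b <- rnorm(n)
x_b <- 0.6 * y_b + rnorm(n, sd = 0.8)

# A cross-sectional snapshot only shows the association, not the order
cor_a <- cor(x_a, y_a)
cor_b <- cor(x_b, y_b)
round(c(x_causes_y = cor_a, y_causes_x = cor_b), 2)
```

Both worlds produce nearly the same correlation, so the correlation alone cannot tell us which variable came first.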
Solutions for Temporal Ordering
Longitudinal Data: Follow the same people over time
Natural Timing: Use events with clear before/after structure
Historical Records: Trace sequences of events (often not credible)
Example: To study if negative ads cause lower campaign donations:
Weak: Survey voters after election about ads and donations
Strong: Measure donations before/after negative ad campaigns begin
Requirement 3: No Confounding
The Biggest Challenge
Confounding occurs when: A third variable Z causes both X and Y, creating a spurious association
Death by ice cream
graph TD
X[<b>Ice Cream Sales</b>]
Y[<b>Violent Crime</b>]
Z[<b>Hot Weather</b>]
Z --> X
Z --> Y
X --> Y
classDef confounder fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef exposure fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef outcome fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class Z confounder
class X exposure
class Y outcome
linkStyle 0 stroke:#ffffff,stroke-width:3px
linkStyle 1 stroke:#ffffff,stroke-width:3px
linkStyle 2 stroke:#ffffff,stroke-width:3px
Hot weather causes both → spurious correlation
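The ice cream example can be sketched with simulated data. Everything here is hypothetical (standardized units, made-up effect sizes, arbitrary seed): temperature drives both variables, and ice cream sales have no effect on crime by construction. Correlating the residuals after regressing each variable on temperature is one simple way to adjust for the confounder.

```r
set.seed(123)  # arbitrary seed for reproducibility
n <- 5000

# Hot weather (Z) drives both variables; there is no X -> Y effect
temperature <- rnorm(n)
ice_cream   <- 2.0 * temperature + rnorm(n)  # Z -> X
crime       <- 1.5 * temperature + rnorm(n)  # Z -> Y

# Raw association between ice cream sales and crime
raw_cor <- cor(ice_cream, crime)

# Remove the part of each variable explained by temperature,
# then correlate what is left over
ice_resid    <- resid(lm(ice_cream ~ temperature))
crime_resid  <- resid(lm(crime ~ temperature))
adjusted_cor <- cor(ice_resid, crime_resid)

round(c(raw = raw_cor, adjusted_for_temperature = adjusted_cor), 2)
```

The raw correlation is strong, while the adjusted correlation is close to zero, which is what a spurious association looks like once the confounder is accounted for.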
Death by 60 Minutes
graph TD
X[<b>Watching 60 Minutes</b>]
Y[<b>Death</b>]
Z[<b>Old Age</b>]
Z --> X
Z --> Y
X --> Y
classDef confounder fill:#b51963,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef exposure fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
classDef outcome fill:#0073e6,stroke:#ffffff,stroke-width:3px,color:#ffffff,font-weight:bold
class Z confounder
class X exposure
class Y outcome
linkStyle 0 stroke:#ffffff,stroke-width:3px
linkStyle 1 stroke:#ffffff,stroke-width:3px
linkStyle 2 stroke:#ffffff,stroke-width:3px
Old age causes both → spurious correlation
Spurious Correlations
Spurious Correlation: Divorce and Margarine
Spurious Correlation: TSA and Nicolas Cage
Spurious Correlation: Air Quality and Orlando Bloom
Lesson: Always ask “What else could explain this relationship?”
Identifying Confounders
Common confounders in political science:
Socioeconomic status: Affects voting, education, health, political views
Geographic region: Affects culture, economics, political preferences
Age/generation: Affects technology use, political attitudes, life experiences
Media environment: Affects information, political knowledge, opinions
Always ask: “What else could explain both my cause and my effect?”
Modern Applications: Campaign Finance and Vote Share
A Contemporary Causal Question
Question: “Does campaign spending cause candidates to win more votes?”
The Association: Clear relationship between spending and vote share
# Load real campaign finance data
campaign_data <- read_csv("../../data/campaign.csv") %>%
  # Clean and prepare the data
  filter(State == "NH") %>%
  # Convert to more interpretable units
  mutate(
    spending_thousands = total.raised.candidate / 1000,
    vote_share = total.votes,
    incumbent = ifelse(is.na(prev.elect), 0, 1)  # Create incumbent indicator
  )

# Show the association
round(cor(campaign_data$spending_thousands, campaign_data$vote_share, use = "complete.obs"), 2)
[1] 0.38
The correlation coefficient measures the strength and direction of a linear relationship between two variables, ranging from −1 (perfect negative) to +1 (perfect positive).
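To make the definition concrete, the sketch below computes Pearson's r by hand and compares it with R's built-in `cor()`. The two short vectors are made up purely to check the arithmetic.

```r
# Small made-up vectors, used only to check the arithmetic
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson's r: how much x and y vary together,
# divided by how much each varies on its own
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(x, y)

c(manual = r_manual, builtin = r_builtin)
```

Both values are 0.8 here: the summed cross-products equal 8 and each sum of squared deviations equals 10.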
Is This Confounded?
# Investigate confounders: Do incumbents spend more AND get more votes?
campaign_data %>%
  group_by(incumbent) %>%
  summarise(
    avg_spending = mean(spending_thousands, na.rm = TRUE),
    avg_vote_share = mean(vote_share, na.rm = TRUE),
    count = n(),
    .groups = "drop"
  ) %>%
  mutate(
    incumbent = ifelse(incumbent == 1, "Incumbent", "Challenger"),
    avg_spending = round(avg_spending, 1),
    avg_vote_share = round(avg_vote_share, 3)
  )
Reality: “Being an incumbent causes both more spending and more votes”
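A hedged sketch of what adjusting for incumbency can look like, using simulated data rather than the real campaign file. The data frame `sim`, its effect sizes, and the seed are all assumptions; spending is given no direct effect on votes by construction, so any naive relationship comes entirely from incumbency.

```r
set.seed(2024)  # arbitrary seed for reproducibility
n <- 5000

# Simulated stand-in for campaign data: incumbency drives both variables,
# and spending has NO direct effect on votes by construction
incumbent          <- rbinom(n, size = 1, prob = 0.4)
spending_thousands <- 150 + 200 * incumbent + rnorm(n, sd = 30)
votes              <- 10000 + 8000 * incumbent + rnorm(n, sd = 2000)
sim <- data.frame(incumbent, spending_thousands, votes)

# Naive model: spending appears strongly related to votes
naive_fit   <- lm(votes ~ spending_thousands, data = sim)
naive_slope <- coef(naive_fit)[["spending_thousands"]]

# Adjusted model: compare candidates with the same incumbency status
adjusted_fit   <- lm(votes ~ spending_thousands + incumbent, data = sim)
adjusted_slope <- coef(adjusted_fit)[["spending_thousands"]]

round(c(naive = naive_slope, adjusted = adjusted_slope), 1)
```

In this simulation the spending coefficient shrinks toward zero once incumbency is held fixed. Whether the same happens in the real New Hampshire data is an empirical question this sketch does not answer.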
Using AI to Identify Alternative Explanations
AI as Your Confounding Detective
Effective prompt for identifying confounders:
“I found that cities with more coffee shops have higher voter turnout. Before concluding that coffee shops increase civic engagement, help me brainstorm what other variables might cause both coffee shop density AND voter turnout. Think about demographics, economics, and culture.”
AI for Causal Critique
Research design critique prompt:
“Evaluate this causal claim: ‘Social media use reduces political knowledge because people who spend more time on social media score lower on political knowledge tests.’ What are the three requirements for causality, and does this study meet them? What alternative explanations should I consider?”
Common Mistakes in Causal Reasoning
Post Hoc, Ergo Propter Hoc
“After this, therefore because of this”
The Error: Assuming that because B followed A, A caused B
Examples:
“I wore my lucky shirt and my team won”
“The economy improved after the new president took office”
“Crime dropped after we hired more police”
Remember: Temporal ordering is necessary but not sufficient for causation
Selection Bias Disguised as Causation
The Problem: Comparing groups whose members selected into them, so the groups differ in ways other than the supposed cause
Examples:
“Private school students have higher test scores” (ignoring family background)
“People who exercise live longer” (ignoring health consciousness)
“Countries with democracy are more prosperous” (ignoring development level)
Reverse Causation
Getting the direction of causation backwards
Examples:
Does happiness cause success, or does success cause happiness?
Do good policies cause economic growth, or does economic growth enable good policies?
Best Practices for Causal Claims
Red Flags in Causal Arguments
Be skeptical when you see:
“Studies show X causes Y” (without mentioning study design)
Causal claims from cross-sectional data
No discussion of alternative explanations
Dramatic policy recommendations from single studies
Strengthening Causal Arguments
Use multiple lines of evidence:
Replicate findings across different populations
Test with different research designs
Look for natural experiments
Check for temporal consistency
Acknowledge limitations:
Be explicit about what you can and cannot conclude
Discuss alternative explanations
Admit when evidence is merely suggestive
In Our Next Class
Modern Causal Inference
The fundamental problem of causal inference
Average Treatment Effect (ATE) calculation
Randomized controlled trials as the gold standard
Difference-in-differences designs
Key Concepts to Remember
Correlation ≠ Causation - but association is the first requirement
Three requirements: Association, temporal ordering, no confounding
Confounding is everywhere - always ask “what else could explain this?”
John Snow’s method: Map patterns, establish timing, rule out alternatives
AI can help - but only if you ask the right questions
Questions?
Key takeaway: Establishing causation is hard work. It requires careful thinking, good research design, and systematic investigation of alternative explanations.
Next week: We’ll learn about the most powerful tools for making causal claims in the modern social sciences.